AITopics | constitutional ai

Collaborating Authors

constitutional ai

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AI's Euclid's Elements Moment: From Language Models to Computable Thought

Fang, Xinmin, Tao, Lingfeng, Li, Zhengxiong

arXiv.org Artificial IntelligenceJul-11-2025

This paper presents a comprehensive five-stage evolutionary framework for understanding the development of artificial intelligence, arguing that its trajectory mirrors the historical progression of human cognitive technologies. We posit that AI is advancing through distinct epochs, each defined by a revolutionary shift in its capacity for representation and reasoning, analogous to the inventions of cuneiform, the alphabet, grammar and logic, mathematical calculus, and formal logical systems. This "Geometry of Cognition" framework moves beyond mere metaphor to provide a systematic, cross-disciplinary model that not only explains AI's past architectural shifts-from expert systems to Transformers-but also charts a concrete and prescriptive path forward. Crucially, we demonstrate that this evolution is not merely linear but reflexive: as AI advances through these stages, the tools and insights it develops create a feedback loop that fundamentally reshapes its own underlying architecture. We are currently transitioning into a "Metalinguistic Moment," characterized by the emergence of self-reflective capabilities like Chain-of-Thought prompting and Constitutional AI. The subsequent stages, the "Mathematical Symbolism Moment" and the "Formal Logic System Moment," will be defined by the development of a computable calculus of thought, likely through neuro-symbolic architectures and program synthesis, culminating in provably aligned and reliable AI that reconstructs its own foundational representations. This work serves as the methodological capstone to our trilogy, which previously explored the economic drivers ("why") and cognitive nature ("what") of AI. Here, we address the "how," providing a theoretical foundation for future research and offering concrete, actionable strategies for startups and developers aiming to build the next generation of intelligent systems.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.2308

Country: North America > United States > Colorado > Denver County > Denver (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.93)
Law (0.68)
Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Wide Reflective Equilibrium in LLM Alignment: Bridging Moral Epistemology and AI Safety

Brophy, Matthew

arXiv.org Artificial IntelligenceJun-3-2025

As large language models (LLMs) become more powerful and pervasive across society, ensuring these systems are beneficial, safe, and aligned with human values is crucial. Current alignment techniques, like Constitutional AI (CAI), involve complex iterative processes. This paper argues that the Method of Wide Reflective Equilibrium (MWRE) -- a well-established coherentist moral methodology -- offers a uniquely apt framework for understanding current LLM alignment efforts. Moreover, this methodology can substantively augment these processes by providing concrete pathways for improving their dynamic revisability, procedural legitimacy, and overall ethical grounding. Together, these enhancements can help produce more robust and ethically defensible outcomes. MWRE, emphasizing the achievement of coherence between our considered moral judgments, guiding moral principles, and relevant background theories, arguably better represents the intricate reality of LLM alignment and offers a more robust path to justification than prevailing foundationalist models or simplistic input-output evaluations. While current methods like CAI bear a structural resemblance to MWRE, they often lack its crucial emphasis on dynamic, bi-directional revision of principles and the procedural legitimacy derived from such a process. While acknowledging various disanalogies (e.g., consciousness, genuine understanding in LLMs), the paper demonstrates that MWRE serves as a valuable heuristic for critically analyzing current alignment efforts and for guiding the future development of more ethically sound and justifiably aligned AI systems.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.00415

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Law (0.68)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI

Noh, Eunchung, Baek, Jeonghun

arXiv.org Artificial IntelligenceFeb-23-2025

Recent research has increasingly focused on training large language models (LLMs) using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe responses, remains underexplored in the context of FedLLM. In FedLLM, client data used for training may contain harmful content, leading to unsafe LLMs that generate harmful responses. Aggregating such unsafe LLMs into the global model and distributing them to clients may result in the widespread deployment of unsafe LLMs. To address this issue, we incorporate two well-known RAI methods into FedLLM: the safety filter and constitutional AI. Our experiments demonstrate that these methods significantly enhance the safety of the LLM, achieving over a 20% improvement on AdvBench, a benchmark for evaluating safety performance.

fedllm, red response, safety filter, (13 more...)

arXiv.org Artificial Intelligence

2502.16691

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Virginia (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

C3AI: Crafting and Evaluating Constitutions for Constitutional AI

Kyrychenko, Yara, Zhou, Ke, Bogucka, Edyta, Quercia, Daniele

arXiv.org Artificial IntelligenceFeb-21-2025

Constitutional AI (CAI) guides LLM behavior using constitutions, but identifying which principles are most effective for model alignment remains an open challenge. We introduce the C3AI framework (\textit{Crafting Constitutions for CAI models}), which serves two key functions: (1) selecting and structuring principles to form effective constitutions before fine-tuning; and (2) evaluating whether fine-tuned CAI models follow these principles in practice. By analyzing principles from AI and psychology, we found that positively framed, behavior-based principles align more closely with human preferences than negatively framed or trait-based principles. In a safety alignment use case, we applied a graph-based principle selection method to refine an existing CAI constitution, improving safety measures while maintaining strong general reasoning capabilities. Interestingly, fine-tuned CAI models performed well on negatively framed principles but struggled with positively framed ones, in contrast to our human alignment results. This highlights a potential gap between principle design and model adherence. Overall, C3AI provides a structured and scalable approach to both crafting and evaluating CAI constitutions.

anthropic choose, constitution, human value, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3696410.3714705

2502.15861

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
(7 more...)

Genre:

Research Report > New Finding (0.93)
Overview (0.87)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Government (0.93)
Information Technology > Security & Privacy (0.92)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers

Menke, Antonio-Gabriel Chacón, Tan, Phan Xuan

arXiv.org Artificial IntelligenceFeb-1-2025

Recent incidents highlight safety risks in Large Language Models (LLMs), motivating research into alignment methods like Constitutional AI (CAI). This paper explores CAI's self-critique mechanism on small, uncensored 7-9B parameter models: DeepSeek-R1, Gemma-2, Llama 3.1, and Qwen2.5. Using HarmBench, we demonstrate that while all models showed capacity for harm reduction through self-critique, effectiveness varied significantly, with DeepSeek-R1's explicit reasoning process yielding superior results. These findings suggest that CAI-inspired prompting strategies can enhance safety in resource-constrained models, though success depends on the model's capacity for harm detection.

constitutional ai, deepseek-r1, language model, (15 more...)

arXiv.org Artificial Intelligence

2503.17365

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Law > Civil Rights & Constitutional Law (0.42)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction

Henneking, Carl-Leander, Beger, Claas

arXiv.org Artificial IntelligenceJan-28-2025

Multiple options exist to align pre-trained Large Language Models (LLMs) to better adhere to human preferences. Popular methods include Reinforcement Learning from Human Feedback (RLHF), which trains a reward model to act as a proxy for human feedback to rate model outputs, and Direct Preference Optimization (DPO), which eliminates an explicit reward model to represent human preferences, and instead, implicitly defines this in their loss function for fine-tuning. Both approaches heavily rely on pairwise human-annotated preference data that ranks model outputs. As an alternative method to alignment, Anthropic introduced Constitutional AI (CAI) [1], which offers a rule-based alternative to alignment based on a core set of principles/values called constitution. This set contains key ethical, moral, and safety standards that guide the outputs and promote desired behaviors through repeated critiquing of model outputs. Having an explicitly defined set of core values aids in the interpretability of the changes induced through the alignment procedure, as typical approaches like DPO or RLHF rely on an implicitly defined set of principles embedded in the pairwise preference data. Building on the idea of CAI, [2] proposed an Inverse Constitutional AI (ICAI) algorithm.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.17112

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.67)

Add feedback

Bringing AI Participation Down to Scale: A Comment on Open AIs Democratic Inputs to AI Project

Moats, David, Ganguly, Chandrima

arXiv.org Artificial IntelligenceJul-16-2024

This commentary piece reviews the recent Open AI Democratic Inputs programme, which funded 10 teams to design procedures for public participation in generative AI. While applauding the technical innovations in these projects, we identify several shared assumptions including the generality of LLMs, extracting abstract values, soliciting solutions not problems and equating participation with democracy. We call instead for AI participation which involves specific communities and use cases and solicits concrete problems to be remedied. We also find it important that these communities have a stake in the outcome, including ownership of data or models.

moat and ganguly 2024, participant, participation, (13 more...)

arXiv.org Artificial Intelligence

2407.11613

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > India > West Bengal > Kolkata (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Law (0.68)
Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Add feedback

Public Constitutional AI

Abiri, Gilad

arXiv.org Artificial IntelligenceJun-24-2024

We are increasingly subjected to the power of AI authorities. As AI decisions become inescapable, entering domains such as healthcare, education, and law, we must confront a vital question: how can we ensure AI systems have the legitimacy necessary for effective governance? This essay argues that to secure AI legitimacy, we need methods that engage the public in designing and constraining AI systems, ensuring these technologies reflect the community's shared values. Constitutional AI, proposed by Anthropic, represents a step towards this goal, offering a model for democratic control of AI. However, while Constitutional AI's commitment to hardcoding explicit principles into AI models enhances transparency and accountability, it falls short in two crucial aspects: addressing the opacity of individual AI decisions and fostering genuine democratic legitimacy. To overcome these limitations, this essay proposes "Public Constitutional AI." This approach envisions a participatory process where diverse stakeholders, including ordinary citizens, deliberate on the principles guiding AI development. The resulting "AI Constitution" would carry the legitimacy of popular authorship, grounding AI governance in the public will. Furthermore, the essay proposes "AI Courts" to develop "AI case law," providing concrete examples for operationalizing constitutional principles in AI training. This evolving combination of constitutional principles and case law aims to make AI governance more responsive to public values. By grounding AI governance in deliberative democratic processes, Public Constitutional AI offers a path to imbue automated authorities with genuine democratic legitimacy, addressing the unique challenges posed by increasingly powerful AI systems while ensuring their alignment with the public interest.

constitutional ai, legitimacy, onstitutional ai, (17 more...)

arXiv.org Artificial Intelligence

2406.16696

Country:

Europe (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.50)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology (1.00)
Leisure & Entertainment > Games (0.93)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

AI Governance and Accountability: An Analysis of Anthropic's Claude

Priyanshu, Aman, Maurya, Yash, Hong, Zuofei

arXiv.org Artificial IntelligenceMay-2-2024

As AI systems become increasingly prevalent and impactful, the need for effective AI governance and accountability measures is paramount. This paper examines the AI governance landscape, focusing on Anthropic's Claude, a foundational AI model. We analyze Claude through the lens of the NIST AI Risk Management Framework and the EU AI Act, identifying potential threats and proposing mitigation strategies. The paper highlights the importance of transparency, rigorous benchmarking, and comprehensive data handling processes in ensuring the responsible development and deployment of AI systems. We conclude by discussing the social impact of AI governance and the ethical considerations surrounding AI accountability.

anthropic, claude, transparency, (15 more...)

arXiv.org Artificial Intelligence

2407.01557

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Informed AI Regulation: Comparing the Ethical Frameworks of Leading LLM Chatbots Using an Ethics-Based Audit to Assess Moral Reasoning and Normative Values

Chun, Jon, Elkins, Katherine

arXiv.org Artificial IntelligenceJan-9-2024

With the rise of individual and collaborative networks of autonomous agents, AI is deployed in more key reasoning and decision-making roles. For this reason, ethics-based audits play a pivotal role in the rapidly growing fields of AI safety and regulation. This paper undertakes an ethics-based audit to probe the 8 leading commercial and open-source Large Language Models including GPT-4. We assess explicability and trustworthiness by a) establishing how well different models engage in moral reasoning and b) comparing normative values underlying models as ethical frameworks. We employ an experimental, evidence-based approach that challenges the models with ethical dilemmas in order to probe human-AI alignment. The ethical scenarios are designed to require a decision in which the particulars of the situation may or may not necessitate deviating from normative ethical principles. A sophisticated ethical framework was consistently elicited in one model, GPT-4. Nonetheless, troubling findings include underlying normative frameworks with clear bias towards particular cultural norms.

alignment, moral reasoning, reasoning, (15 more...)

arXiv.org Artificial Intelligence

2402.01651

Genre: Research Report (0.82)

Industry:

Government (1.00)
Education (1.00)
Law > Statutes (0.50)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback